
    Modelling transcriptional regulation with Gaussian processes

    A challenging problem in systems biology is the quantitative modelling of transcriptional regulation. Transcription factors (TFs), the key proteins at the centre of the regulatory processes, may be subject to post-translational modification, rendering them unobservable at the mRNA level, or they may be controlled outside the subsystem being modelled. In both cases, a mechanistic description of the regulatory system needs to be able to deal with latent activity profiles of the key regulators. A promising approach to these difficulties uses Gaussian processes to define a prior distribution over the latent TF activity profiles. Inference is based on the principles of non-parametric Bayesian statistics, consistently inferring the posterior distribution of the unknown TF activities from the observed expression levels of potential target genes. The present work provides explicit solutions to the differential equations needed to model the data in this manner, as well as the derivatives needed for effective optimisation. The work further explores identifiability issues not fully examined in previous work and shows how these can cause difficulties with inference. We subsequently examine how the method performs on two different TFs, including how the model behaves with a more biologically realistic mechanistic model. Finally, we analyse the effect of more biologically realistic non-Gaussian noise on this model, showing how it can reduce the accuracy of the inference.
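To show what an explicit solution of this kind looks like, here is a minimal sketch that assumes the linear activation model common in this literature, dx/dt = B + S f(t) - D x(t), where x is the target-gene mRNA, f the latent TF activity, and B, S, D the basal rate, sensitivity and decay rate. These symbols and the model form are assumptions; the abstract does not state them. With x(0) = x0 the solution is x(t) = B/D + (x0 - B/D) e^(-Dt) + S * integral from 0 to t of e^(-D(t-u)) f(u) du. The sketch evaluates that convolution numerically with the trapezoidal rule for a given f; under a GP prior on f, this integral is the part one would instead handle analytically.

```python
import math
from typing import Callable


def target_mrna(t: float, f: Callable[[float], float], B: float, S: float,
                D: float, x0: float, n_steps: int = 2000) -> float:
    """Solution of dx/dt = B + S*f(t) - D*x with x(0) = x0 (assumed model).

    The convolution with the latent TF activity f is integrated numerically
    with the trapezoidal rule on n_steps intervals.
    """
    if t == 0.0:
        return x0
    h = t / n_steps

    def kernel_times_f(u: float) -> float:
        # Contribution of TF activity at time u to the mRNA level at time t
        return math.exp(-D * (t - u)) * f(u)

    interior = sum(kernel_times_f(i * h) for i in range(1, n_steps))
    integral = h * (0.5 * (kernel_times_f(0.0) + kernel_times_f(t)) + interior)
    return B / D + (x0 - B / D) * math.exp(-D * t) + S * integral


def pulse(u: float) -> float:
    """Hypothetical transient TF activity profile peaking at u = 2."""
    return math.exp(-((u - 2.0) ** 2))


x_at_5 = target_mrna(5.0, pulse, B=0.1, S=1.0, D=0.5, x0=0.2)
```

A quick sanity check: for a constant activity f(t) = c the integral is c(1 - e^(-Dt))/D, so the solution relaxes from x0 towards the steady state (B + S c)/D.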

    Sparse Bayesian variable selection for the identiļ¬cation of antigenic variability in the Foot-and-Mouth disease virus

    Vaccines created from closely related viruses are vital for offering protection against newly emerging strains. For Foot-and-Mouth disease virus (FMDV), where multiple serotypes co-circulate, testing large numbers of vaccines can be infeasible. Therefore, the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Here we describe a novel sparse Bayesian variable selection model using spike and slab priors, which is able to predict antigenic variability and identify sites that are important for the neutralisation of the virus. We are able to identify multiple residues which are known to be key indicators of antigenic variability. Many of these were not identified previously using frequentist mixed-effects models and still cannot be found when an ℓ1 penalty is used. We further explore how the Markov chain Monte Carlo (MCMC) proposal method for the inclusion of variables can offer significant reductions in computational requirements, both for spike and slab priors in general and for our hierarchical Bayesian model in particular.
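The abstract does not spell out the sampler, so the following is only a generic illustration of spike and slab variable selection: a component-wise Gibbs sampler for a linear model y = X beta + noise, assuming Gaussian noise with known variance `sigma2`, a point mass at zero (spike) and a N(0, `tau2`) slab. It is not the paper's hierarchical model or its improved proposal mechanism; the component-wise Gibbs sampler is the established baseline such proposals are compared against. For each variable, the slab coefficient is integrated out to obtain the inclusion odds, and the coefficient is redrawn only if the variable is included. The data in the example are simulated.

```python
import math
import random
from typing import List


def stable_sigmoid(z: float) -> float:
    """Logistic function that avoids overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spike_slab_gibbs(cols: List[List[float]], y: List[float], sigma2: float = 1.0,
                     tau2: float = 1.0, pi: float = 0.5, n_iter: int = 1000,
                     burn_in: int = 200, seed: int = 0) -> List[float]:
    """Component-wise Gibbs sampler; returns posterior inclusion probabilities.

    cols[j] is the j-th covariate column. Prior: beta_j = 0 with probability
    1 - pi (spike), otherwise beta_j ~ N(0, tau2) (slab). sigma2 is known.
    """
    rng = random.Random(seed)
    p, n = len(cols), len(y)
    beta = [0.0] * p
    included = [False] * p
    resid = list(y)  # y - X beta; beta starts at zero
    xtx = [sum(v * v for v in col) for col in cols]
    counts = [0] * p
    log_prior_odds = math.log(pi / (1.0 - pi))
    for it in range(n_iter):
        for j, xj in enumerate(cols):
            if included[j]:  # drop variable j: resid = y - sum_{k != j} x_k beta_k
                for i in range(n):
                    resid[i] += xj[i] * beta[j]
            xr = sum(a * b for a, b in zip(xj, resid))
            prec = xtx[j] / sigma2 + 1.0 / tau2  # posterior precision under the slab
            mean = xr / sigma2 / prec
            # log Bayes factor, slab vs spike, with beta_j integrated out
            log_bf = -0.5 * math.log(1.0 + tau2 * xtx[j] / sigma2) + 0.5 * prec * mean ** 2
            included[j] = rng.random() < stable_sigmoid(log_bf + log_prior_odds)
            if included[j]:
                beta[j] = rng.gauss(mean, 1.0 / math.sqrt(prec))
                for i in range(n):
                    resid[i] -= xj[i] * beta[j]
            else:
                beta[j] = 0.0
            if it >= burn_in and included[j]:
                counts[j] += 1
    return [c / (n_iter - burn_in) for c in counts]


# Simulated example: only covariates 0 and 3 have non-zero effects
rng = random.Random(1)
n = 200
true_beta = [2.0, 0.0, 0.0, -1.5, 0.0]
cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in true_beta]
y = [sum(b * col[i] for b, col in zip(true_beta, cols)) + rng.gauss(0.0, 1.0)
     for i in range(n)]
pip = spike_slab_gibbs(cols, y)
```

Because each variable is updated with the others held fixed, strongly correlated covariates can make this sampler mix slowly, which is the kind of weakness that alternative proposal mechanisms target.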

    Sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution

    Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus, where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore, the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To predict cross-protection, we must understand the antigenic variability within a virus serotype and within distinct lineages of a virus, and identify the antigenic residues and evolutionary changes that cause this variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods are a family of sparse Bayesian hierarchical models that use spike and slab priors to identify sites in the viral protein which are important for the neutralisation of the virus. We demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes and show that the SABRE method outperforms established methods (mixed-effects models based on forward variable selection or ℓ1 regularisation) on both synthetic and viral datasets. In addition, we test a number of different versions of the SABRE method, comparing conjugate and semi-conjugate prior specifications and an alternative to the spike and slab prior: the binary mask model.
We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over those of the established component-wise Gibbs sampler. The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residues and to provide hypotheses about other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. We then provide an extended version of the SABRE method, the eSABRE method, based on a latent variable model. Through this latent variable model, the eSABRE method takes further account of the structure of the datasets for FMDV and the Influenza virus and improves the modelling of the error. We show that the eSABRE method outperforms the SABRE methods in simulation studies and propose a new information criterion for selecting the random effects factors that should be included in the eSABRE method: the block-integrated Widely Applicable Information Criterion (biWAIC). We demonstrate that biWAIC performs as well as two other methods for selecting the random effects factors and combine it with the eSABRE method to analyse two large Influenza datasets. Inference on these large datasets is computationally infeasible with the SABRE methods, but owing to the improved structure of its likelihood, the eSABRE method offers a computational improvement that allows it to be applied to them. The results show that the eSABRE method can be used in a fully automatic manner to identify a large number of antigenic residues on a variety of the antigenic sites of two Influenza serotypes, and to predict a number of nearby sites that may also be antigenic and are worthy of further experimental investigation.

    Improving the identification of antigenic sites in the H1N1 Influenza virus through accounting for the experimental structure in a sparse hierarchical Bayesian model

    Understanding how genetic changes allow emerging virus strains to escape the protection afforded by vaccination is vital for the maintenance of effective vaccines. We use structural and phylogenetic differences between pairs of virus strains to identify important antigenic sites on the surface of the influenza A(H1N1) virus through the prediction of haemagglutination inhibition (HI) titres, which are pairwise measures of the antigenic similarity of virus strains. We propose a sparse hierarchical Bayesian model that can deal with the pairwise structure and inherent experimental variability in the H1N1 data through the introduction of latent variables. The latent variables represent the underlying HI titre measurement of any given pair of virus strains and help to account for the fact that, for any HI titre measurement between the same pair of virus strains, the difference in the viral sequence remains the same. By accurately representing the structure of the H1N1 data, the model can select virus sites which are antigenic, while its latent structure achieves the computational efficiency required to deal with large virus sequence data, as typically available for the influenza virus. In addition to the latent variable model, we propose a new method, the block-integrated widely applicable information criterion (biWAIC), for selecting between competing models. We show how this enables us to select the random effects effectively when used with the proposed model, and we apply both methods to an A(H1N1) data set.
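For orientation, here is a minimal sketch of the standard (pointwise) WAIC that biWAIC builds on, computed from a matrix of log-likelihood values over posterior draws. It does not implement biWAIC itself: as the name suggests, biWAIC works with blocks of data with latent variables integrated out, and the exact construction is given in the paper. The function name `waic` and its return layout are illustrative choices. Passing block-level log-likelihoods as columns shows the general shape of a block-wise criterion, but should not be taken as the authors' definition.

```python
import math
import statistics
from typing import List, Tuple


def waic(loglik: List[List[float]]) -> Tuple[float, float, float]:
    """Standard WAIC from pointwise log-likelihoods.

    loglik[s][i] = log p(y_i | theta_s) for posterior draw s and data unit i.
    Returns (waic, lppd, p_waic) on the deviance scale (lower is better).
    """
    n_draws = len(loglik)
    if n_draws < 2:
        raise ValueError("need at least two posterior draws")
    lppd = 0.0
    p_waic = 0.0
    for i in range(len(loglik[0])):
        column = [draw[i] for draw in loglik]
        m = max(column)
        # log of the posterior-mean likelihood, via log-sum-exp for stability
        lppd += m + math.log(sum(math.exp(v - m) for v in column) / n_draws)
        # effective number of parameters: posterior variance of the log-likelihood
        p_waic += statistics.variance(column)
    return -2.0 * (lppd - p_waic), lppd, p_waic
```

When comparing candidate random-effects structures, one would compute this quantity for each fitted model and prefer the lowest value.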

    Optimising outputs from a validated online instrument to measure health-related quality of life (HRQL) in dogs

    Measurement of health-related quality of life (HRQL) is becoming increasingly valuable within veterinary preventative health care and chronic disease management, as well as in outcomes research. Initial reliability testing and validation of a 22-item shortened version of VetMetrica (VM), a structured questionnaire instrument to measure HRQL in dogs via a mobile application, were reported previously. Meaningful interpretation and presentation of the 4 domain scores comprising the HRQL profile generated by VM is key to its successful use in clinical practice and research. Study one describes the transformation of domain scores from 0–6 to 0–100 and their normalisation based on the healthy canine population in two age ranges, such that a score of 50 on the 0–100 scale represents the age-related average healthy dog. It also establishes a threshold to assess domain-specific health status for individual dogs. This provides the clinician with a simple method of ascertaining the health status of an individual dog relative to the average healthy population in the same age group (norm-based scoring). Study two determines the minimum important difference (MID) in domain scores, which represents the smallest improvement in score that is meaningful to the dog owner, thus providing the clinician with a means of recognising what is likely to be a significant improvement in scores for an individual dog over time. Visual representation of these guidelines for interpreting VM profile scores is presented using case studies.
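The two steps of Study one can be sketched as follows, assuming a linear rescaling from 0–6 to 0–100 and the common T-score convention for norm-based scoring (healthy mean mapped to 50, one standard deviation to 10). The abstract does not give the exact normalisation formula, and the reference scores below are hypothetical, not data from the study. The example shows why age-specific norms matter: the same rescaled score can be below average for one age band and above average for another.

```python
import statistics
from typing import Dict, List


def to_0_100(raw: float, raw_min: float = 0.0, raw_max: float = 6.0) -> float:
    """Linearly map a domain score from [raw_min, raw_max] onto [0, 100]."""
    return (raw - raw_min) / (raw_max - raw_min) * 100.0


def norm_based_score(score: float, healthy: List[float]) -> float:
    """T-score against a healthy reference group: its mean maps to 50, one SD to 10."""
    mu = statistics.mean(healthy)
    sd = statistics.stdev(healthy)
    return 50.0 + 10.0 * (score - mu) / sd


# Hypothetical healthy reference scores (0-100 scale) for two age bands
healthy_by_age: Dict[str, List[float]] = {
    "younger": [70.0, 75.0, 80.0, 85.0, 90.0],
    "older": [55.0, 60.0, 65.0, 70.0, 75.0],
}

dog_score = to_0_100(4.5)  # raw 4.5 on the 0-6 scale -> 75.0
# Below 50 relative to the younger band, above 50 relative to the older band
by_band = {band: norm_based_score(dog_score, ref) for band, ref in healthy_by_age.items()}
```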

    Optimisation of scores generated by an online feline health-related quality of life (HRQL) instrument to assist the veterinary user interpret its results

    Using methodology previously described for the dog health-related quality of life (HRQL) tool (VetMetrica™), the aim was to optimize the score profile of a comparable feline online HRQL instrument for monitoring HRQL in cats, to assist in its interpretation. Measuring HRQL helps quantify the impact of disease and its treatment on well-being, aids clinical decision making and provides information in clinical trials. In Study 1, using data collected from previous studies, scores generated for three domains of HRQL (Vitality, Comfort, Emotional Well-being) in healthy cats were normalized using the standard statistical techniques of logit transformation and T-scores, such that the average healthy cat has a score of 50 in all three HRQL domains. Using normalized scores from healthy and sick cats, a threshold score of 44.8 was determined, above which 70% of healthy cats should score. Study 2 determined the Minimal Important Difference (MID) in normalized score that constituted a clinically significant improvement in each domain. Three methods were tested to determine the MID, with the final choice made based on statistical and clinical considerations. Thresholds of 5, 7.5 and 5 were chosen for the three HRQL domains representing Vitality, Comfort and Emotional Well-being, respectively. This study provides a means of displaying HRQL scores from an online application in an easily interpretable manner and quantifies a clinically meaningful improvement in score. To illustrate the practical application of these developments, three case examples are presented. Example 1 illustrates the raw and normalized scores for a group of overweight cats enrolled in a Feline Weight Management Programme. Example 2 shows three groups of osteoarthritic cats, each with a different severity of disease. Example 3 is an elderly, unwell cat whose HRQL was recorded over time, specifically to facilitate end-of-life discussion between owner and veterinary clinician.
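The normalization in Study 1 can be sketched as a logit transform followed by T-scoring against healthy cats. The raw score range (0 to `max_score`), the clipping constant `eps` and the healthy raw scores are hypothetical assumptions for illustration. The last function checks one piece of arithmetic: if healthy T-scores were exactly normal with mean 50 and SD 10, the score exceeded by 70% of healthy cats would be the 30th percentile, 50 + 10 × (-0.5244) ≈ 44.76, which rounds to the reported 44.8. The paper determined its threshold using both healthy and sick cats, so this is a consistency check rather than its procedure.

```python
import math
import statistics
from statistics import NormalDist
from typing import List


def logit_score(raw: float, max_score: float = 6.0, eps: float = 1e-3) -> float:
    """Logit of a domain score expressed as a proportion of its maximum.

    max_score and eps (which keeps 0 and max_score finite) are assumptions.
    """
    p = min(max(raw / max_score, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def t_scores(values: List[float], healthy_ref: List[float]) -> List[float]:
    """Rescale logits so the healthy reference has mean 50 and SD 10."""
    mu = statistics.mean(healthy_ref)
    sd = statistics.stdev(healthy_ref)
    return [50.0 + 10.0 * (v - mu) / sd for v in values]


def threshold_for_fraction_above(frac_above: float) -> float:
    """T-score exceeded by frac_above of healthy cats, assuming T ~ N(50, 10^2)."""
    return NormalDist(mu=50.0, sigma=10.0).inv_cdf(1.0 - frac_above)


# Hypothetical raw scores for a small healthy reference group
healthy_raw = [4.2, 4.8, 5.1, 3.9, 4.5, 5.4]
healthy_logits = [logit_score(r) for r in healthy_raw]
healthy_t = t_scores(healthy_logits, healthy_logits)
threshold = threshold_for_fraction_above(0.7)  # about 44.76
```

The logit step stretches scores near the ends of the scale, so a given change near the ceiling counts for more than the same raw change in the middle of the range.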

    In silico optimization of mass spectrometry fragmentation strategies in metabolomics

    Liquid chromatography (LC) coupled to tandem mass spectrometry (MS/MS) is widely used to identify small molecules in untargeted metabolomics. Various strategies exist to acquire MS/MS fragmentation spectra; however, the development of new acquisition strategies is hampered by the lack of simulators that let researchers prototype, compare and optimize strategies before validating them on real instruments. We introduce Virtual Metabolomics Mass Spectrometer (ViMMS), a metabolomics LC-MS/MS simulator framework that allows scan-level control of the MS2 acquisition process in silico. ViMMS can generate new LC-MS/MS data based on empirical data, or virtually re-run a previous LC-MS/MS analysis using pre-existing data, to allow the testing of different fragmentation strategies. To demonstrate its utility, we show how ViMMS can be used to optimize N for Top-N data-dependent acquisition (DDA), giving results comparable to modifying N on the mass spectrometer. We expect that ViMMS will save method development time by allowing offline evaluation of novel fragmentation strategies and optimization of the fragmentation strategy for a particular experiment.
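As an illustration of the scan-level logic a Top-N DDA controller encodes, here is a minimal, self-contained sketch of Top-N precursor selection with dynamic exclusion. The class `TopNSelector`, its parameters and its defaults (ppm tolerance, exclusion time in seconds) are hypothetical and are not the ViMMS API. After each MS1 scan it picks the N most intense precursors above an intensity threshold that are not inside an active exclusion window, then excludes them for a fixed time.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


@dataclass
class TopNSelector:
    """Top-N precursor selection with dynamic exclusion (hypothetical class)."""
    n: int
    min_intensity: float = 0.0
    mz_tol_ppm: float = 10.0
    exclusion_s: float = 15.0
    _excluded: List[Tuple[float, float]] = field(default_factory=list, init=False)

    def _is_excluded(self, mz: float) -> bool:
        # True if mz lies within the ppm tolerance of any actively excluded m/z
        return any(abs(mz - ex_mz) <= ex_mz * self.mz_tol_ppm * 1e-6
                   for ex_mz, _ in self._excluded)

    def select(self, ms1_peaks: List[Peak], rt: float) -> List[float]:
        """Return the m/z values to fragment after an MS1 scan at retention time rt (s)."""
        # Exclusion windows that have expired no longer block a precursor
        self._excluded = [(mz, until) for mz, until in self._excluded if until > rt]
        candidates = [p for p in ms1_peaks
                      if p[1] >= self.min_intensity and not self._is_excluded(p[0])]
        candidates.sort(key=lambda p: p[1], reverse=True)
        chosen = [mz for mz, _ in candidates[: self.n]]
        for mz in chosen:
            self._excluded.append((mz, rt + self.exclusion_s))
        return chosen


selector = TopNSelector(n=2, min_intensity=1e3)
scan = [(100.0, 5e4), (200.0, 1e5), (300.0, 2e4), (400.0, 500.0)]
first = selector.select(scan, rt=0.0)    # [200.0, 100.0]
second = selector.select(scan, rt=5.0)   # [300.0]; the others are still excluded
third = selector.select(scan, rt=20.0)   # [200.0, 100.0]; all exclusions expired
```

`n` is the parameter the abstract describes optimizing: a larger N fragments more precursors per cycle but lengthens each cycle, so MS1 scans become less frequent. A simulator lets that trade-off be explored without instrument time.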